Classification structurée pour l'apprentissage par renforcement inverse
Internal identifier: 001692 (Main/Exploration); previous: 001691; next: 001693
Authors: Edouard Klein [France]; Bilal Piot [France]; Matthieu Geist [France]; Olivier Pietquin [France]
Source:
- Revue d'intelligence artificielle [0992-499X]; 2013.
French descriptors
- Pascal (Inist)
- Wicri:
- topic: Classification, Politique (Policy), Automobile (Motor car).
English descriptors
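- KwdEn: Classification, Data structure, Direct problem, Heuristic method, Internal structure, Inverse problem, Learning algorithm, Motor car, Parameterization, Policy, Reinforcement learning, Reward, Simulator, Vehicle driving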
Abstract
This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Unlike most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed using only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
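The principle stated in the abstract can be illustrated with a short Python/NumPy sketch. It rests on assumptions that are not taken from this record: the reward is linear in state features, R_θ(s) = θᵀφ(s); the expert feature expectation μ_E(s, a) = E[Σ_t γ^t φ(s_t) | s_0 = s, a_0 = a] is estimated by truncated Monte Carlo sums along expert trajectories; the multiclass classifier with score q_θ(s, a) = θᵀμ(s, a) is trained by plain subgradient descent on a structured hinge loss with a 0/1 margin; and, for actions the expert never took, μ(s, a) is set to γ·μ_E(s, a_E), a hypothetical stand-in for the "appropriate heuristic" mentioned in the abstract. The function names (`expert_feature_expectations`, `action_features`, `scirl_sketch`, `reward`) and the toy chain are illustrative only, not the authors' code.

```python
import numpy as np


def expert_feature_expectations(trajectories, phi, gamma):
    """Monte Carlo estimate of mu_E(s_t, a_t) = sum_k gamma^k * phi(s_{t+k}).

    Each trajectory is a list of (state, action) pairs sampled from the expert.
    The sum is truncated at the end of the trajectory, so late steps are biased low.
    """
    actions, mus = [], []
    for traj in trajectories:
        running = None
        tail = []
        # Sweep backwards: mu_t = phi(s_t) + gamma * mu_{t+1}.
        for s, _ in reversed(traj):
            f = np.asarray(phi(s), dtype=float)
            running = f if running is None else f + gamma * running
            tail.append(running)
        tail.reverse()
        actions.extend(a for _, a in traj)
        mus.extend(tail)
    return actions, mus


def action_features(mu_expert, expert_action, n_actions, gamma):
    """mu(s, a) for every action; non-expert rows use the ASSUMED heuristic gamma * mu_E."""
    mu = np.tile(gamma * mu_expert, (n_actions, 1))
    mu[expert_action] = mu_expert
    return mu  # shape (n_actions, d)


def scirl_sketch(trajectories, phi, n_actions, gamma=0.9, lam=1e-3, lr=0.1, n_iters=500):
    """Hypothetical sketch: fit theta for the score q(s, a) = theta . mu(s, a) by
    structured large-margin subgradient descent; the reward is R(s) = theta . phi(s)."""
    actions, mus = expert_feature_expectations(trajectories, phi, gamma)
    mu_all = np.stack([action_features(m, a, n_actions, gamma)
                       for m, a in zip(mus, actions)])  # (n, n_actions, d)
    actions = np.asarray(actions)
    n, _, d = mu_all.shape
    rows = np.arange(n)
    margin = np.ones((n, n_actions))
    margin[rows, actions] = 0.0  # no margin on the expert's own action
    theta = np.zeros(d)
    for _ in range(n_iters):
        # Margin-augmented scores; argmax gives the most violating action per sample.
        worst = (mu_all @ theta + margin).argmax(axis=1)
        # Subgradient of the mean hinge loss: zero for samples where the expert action is the argmax.
        grad = (mu_all[rows, worst] - mu_all[rows, actions]).mean(axis=0)
        theta -= lr * (grad + lam * theta)  # L2-regularized step
    return theta


def reward(theta, phi, s):
    """Learned linear reward R_theta(s) = theta . phi(s)."""
    return float(theta @ phi(s))


if __name__ == "__main__":
    # Toy 5-state chain (assumed): action 1 moves right, the last state is absorbing.
    n_states, n_actions = 5, 2

    def phi(s):
        return np.eye(n_states)[s]  # one-hot state features

    traj = [(s, 1) for s in range(n_states)] + [(n_states - 1, 1)] * 5
    theta = scirl_sketch([traj], phi, n_actions)
    print([round(reward(theta, phi, s), 3) for s in range(n_states)])
```

Note that the toy run only ever sees the expert's own trajectory, matching the "expert trajectories only" setting of the abstract. Because every non-expert score is γ times the expert's score under this assumed heuristic, the greedy classifier reproduces the expert as soon as θᵀμ_E(s, a_E) > 0; a richer estimate of μ for non-expert actions would presumably make the learned reward more informative.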
Affiliations: France; Grand Est (formerly Lorraine region); Metz, Nancy
Links to previous steps (curation, corpus, ...)
- to stream PascalFrancis, to step Corpus: 000065
- to stream PascalFrancis, to step Curation: 000942
- to stream PascalFrancis, to step Checkpoint: 000037
- to stream Main, to step Merge: 001708
- to stream Main, to step Curation: 001692
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Classification structurée pour l'apprentissage par renforcement inverse</title>
<author><name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>LORIA - équipe ABC</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Piot, Bilal" sort="Piot, Bilal" uniqKey="Piot B" first="Bilal" last="Piot">Bilal Piot</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pietquin, Olivier" sort="Pietquin, Olivier" uniqKey="Pietquin O" first="Olivier" last="Pietquin">Olivier Pietquin</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216741</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216741 INIST</idno>
<idno type="RBID">Pascal:13-0216741</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000065</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000942</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000037</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000037</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Klein E:classification:structuree:pour</idno>
<idno type="wicri:Area/Main/Merge">001708</idno>
<idno type="wicri:Area/Main/Curation">001692</idno>
<idno type="wicri:Area/Main/Exploration">001692</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Classification structurée pour l'apprentissage par renforcement inverse</title>
<author><name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>LORIA - équipe ABC</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Piot, Bilal" sort="Piot, Bilal" uniqKey="Piot B" first="Bilal" last="Piot">Bilal Piot</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Pietquin, Olivier" sort="Pietquin, Olivier" uniqKey="Pietquin O" first="Olivier" last="Pietquin">Olivier Pietquin</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3"><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Metz</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Classification</term>
<term>Data structure</term>
<term>Direct problem</term>
<term>Heuristic method</term>
<term>Internal structure</term>
<term>Inverse problem</term>
<term>Learning algorithm</term>
<term>Motor car</term>
<term>Parameterization</term>
<term>Policy</term>
<term>Reinforcement learning</term>
<term>Reward</term>
<term>Simulator</term>
<term>Vehicle driving</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Classification</term>
<term>Structure donnée</term>
<term>Apprentissage renforcé</term>
<term>Paramétrisation</term>
<term>Simulateur</term>
<term>Récompense</term>
<term>Politique</term>
<term>Automobile</term>
<term>Conduite véhicule</term>
<term>Structure interne</term>
<term>Algorithme apprentissage</term>
<term>Problème inverse</term>
<term>Problème direct</term>
<term>Méthode heuristique</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Classification</term>
<term>Politique</term>
<term>Automobile</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Unlike most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed using only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Metz</li>
<li>Nancy</li>
</settlement>
</list>
<tree><country name="France"><region name="Grand Est"><name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
</region>
<name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
<name sortKey="Pietquin, Olivier" sort="Pietquin, Olivier" uniqKey="Pietquin O" first="Olivier" last="Pietquin">Olivier Pietquin</name>
<name sortKey="Piot, Bilal" sort="Piot, Bilal" uniqKey="Piot B" first="Bilal" last="Piot">Bilal Piot</name>
</country>
</tree>
</affiliations>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001692 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001692 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:13-0216741 |texte= Classification structurée pour l'apprentissage par renforcement inverse }}
This area was generated with Dilib version V0.6.33.